ABSTRACT

The 1992-93 school year marked the first year of 
implementation of the statewide mandate of the Arizona Student 
Assessment Program (ASAP), which revised previous legislation. Iowa
Tests of Basic Skills testing requirements were restricted to grades 2
and 7, and districts were allowed greater flexibility in their own 
testing, which had previously been almost exclusively through 
criterion-referenced tests. Performance assessments were supported by
the new legislation, and teachers generally saw the ASAP as a 
low-stakes assessment in line with educational trends. The 
implementation of the ASAP and its changes were studied in four 
elementary schools during the first implementation year in a multiple 
case-study design with various data collection methods. Results 
indicate that local responses to the ASAP were varied and that 
differences in implementation were significant. Common among the 
sites was the belief that testing from an outside agency is still 
separate from instruction and is an add-on to normal school
operation. An appendix contains a cross-site data matrix. (Contains 
14 references.) (SLD)

WHAT HAPPENS WHEN THE TEST MANDATE CHANGES?
RESULTS OF A MULTIPLE CASE STUDY 1 

Mary Lee Smith, Audrey J. Noble, Marilyn Cabay,
Walt Heinecke, M. Susan Junker, and Yvonne Saffron 

CRESST/Arizona State University 



Introduction 

The academic year 1992-1993 marked the first year of implementation of
the statewide mandate known as the Arizona Student Assessment Program 
(ASAP), which was authorized by the Arizona Revised Statutes 15-741-744 of 
1990. This bill revised previous legislation, which had mandated testing every 
pupil every spring on the Iowa Tests of Basic Skills (ITBS) in Grades 2-8 and 
the Test of Academic Performance (TAP) in Grades 9-11. The mandate also 
included the requirement that districts develop and administer tests to 
determine if schools were meeting the Arizona Essential Skills, the statewide 
curriculum framework. ASAP reduced the ITBS testing requirements to 
Grades 2 and 7 (and TAP to 11) and moved the testing date to the fall. District 
testing, which heretofore had been almost exclusively by criterion-referenced 
methods, was allowed greater flexibility: Districts could continue with CRTs, 
use portfolio assessments, or administer and locally score the new 
performance assessments using Forms A, B, and C. Form D was designed to 
be the on-demand or audit form of the performance assessment. It was 
administered during March to pupils in Grades 3, 8 and 12, with standardized 
administration rules and procedures. Rubrics for scoring the performance 
test were used, at central scoring sites, by teachers trained by state officials
and representatives of the test developers, Riverside Press and Measurement,
Inc. Scores were reported by student to schools and districts, and by school
and district to general audiences. District average scores were one part of a
state-required Report Card, also including ITBS and district test scores, all
referenced to the Arizona Essential Skills. That is, each district had to submit
a District Assessment Plan (DAP), specifying a mastery level on each of the
Essential Skills and reporting the percentage of pupils who had attained that
level, as indicated by the collection of assessment results. Although ASAP
included all these components (performance assessment Form D, ITBS/TAP,
district testing, DAP, School Report Cards), most people used the term "ASAP"
to refer only to the performance assessment itself.

1 This work was also reported in a paper presented at the annual meeting of the American
Educational Research Association, New Orleans, Louisiana, April 7, 1994.

Like any mandate, ASAP was designed to solve what policy makers 
perceive to be a problem. The perceived solution to the problem lies in 
requiring some uniform action on the part of its agents (McDonnell & Elmore, 
1987). In the case of ASAP, at least two categories of problem were in the 
minds of policy makers. In Noble (1994) and Noble and Smith (in press), we 
reported results of a policy study in which the images and beliefs of policy 
makers and state officials instrumental in the ASAP mandate were examined. 
Some of these individuals conceived of ASAP as a means of improving Arizona 
schools by moving them toward a more ambitious and integrated form of 
curriculum and pedagogy; that is, toward holistic teaching and higher order 
thinking or cognitive-constructivist learning. Others, however, conceived of 
ASAP as a means of making schools more accountable for achievement 
results, specifically focusing the school's attention more intensively on the 
Arizona Essential Skills. 

According to officials of the Arizona Education Association and data 
collected early in 1992-1993, most teachers considered ASAP to be benign, 
supportive of educational trends, and a low-stakes assessment. Just as 
teachers thought of "ASAP" as equated with the performance assessment, they 
also interpreted the mandate as improving instruction toward holism and 
cognitive-constructivism. This interpretation was supported by initial 
information teachers received in state and regional conferences and training 
sessions run by the Arizona Department of Education (ADE). A pilot 
administration, conducted during academic year 1991-1992, reinforced this 
view. Most teachers who examined the pilot test material or participated in
the pilot administration seemed to think that ASAP was "a step in the right
direction." By this, they meant that the performance test was a substantial 
improvement over the ITBS and supported a variety of instructional practice 
that they appreciated. As the findings in this report will show, this view was 
repeatedly challenged over the 1992-1993 year. At the end of the year, the ADE 
published district test results, and the newspaper distributed them, adding 
editorial comments about the failings of public schools. The ADE 
administrators used ASAP results for the same purpose, which altered many 
of the teachers' views about the function of ASAP. 

Conceptual Context of the Study 

The study reported herein is part of a larger project, "What Happens 
When the Test Mandate Changes?" The project encompasses three years of 
data collection on the consequences in Arizona of the implementation of ASAP. 
Several levels of analysis are covered in the project as a whole. The policy
study (Noble, 1994) analyzes the images, beliefs, and values of policy makers 
and administrators as they reflect on the policy change, its antecedents and 
consequences. The present study addresses the consequences of the change in 
mandate in four Arizona elementary schools during the first year of
implementation. During academic year 1993-1994 (the second year of policy 
implementation), we are extending the findings and testing the models 
through focus group interviews and survey methods. The report of the project 
as a whole will focus on the interplay of policy and practice over two years of 
policy implementation and local reactions. 

Focusing on the interplay of policy and practice is a decision that comes 
from our conceptual framework. We drew on Rein (1983) and Weatherly and 
Lipsky (1978) for ideas about where to look for evidence about the effects of 
school policy making. From reading these works, we were committed to the 
idea that definitions of the situation (the images of the problems a given policy 
should solve as well as the characteristics of pupils, teachers, curriculum, 
assessment, and educational change) held by policy makers and shapers are 
translated imperfectly by practitioners. Teachers and principals redefine and 
reinterpret the messages about policy that they receive. They then act (adapt,
teach, learn, evaluate) according to their own definitions of the situation
(Blumer, 1986). This study, therefore, is symbolic interactionist in conceptual
framework and interpretivist in research methodology. 

Specifically, we draw on Erickson (1986), as well as Miles and Huberman
(1984) for our research methods. That is, to understand action and practice, 
we believe that the researcher must engage directly in the local scene, spend
sufficient time to understand action in its specific social context and gain 
access to participant meanings, and show how these meanings-in-action 
evolve over time. Without careful grounding in local cases, a more general 
understanding is impossible. 

This study also draws on previous research on the role of mandated 
testing. An earlier qualitative study (Smith, Edelsky, Draper, Rottenberg, & 
Cherland, 1990) showed that the previous test mandate in Arizona, which 
involved the high-stakes use of the ITBS, had effects such as narrowing 
curriculum, promoting test-like instructional methods, reducing time for 
ordinary instruction, deskilling and demoralizing teachers, and leading to 
inappropriate test preparation practices. A review of related research (Smith, 
1993) showed that similar effects have been experienced in other states and 
settings having high-stakes accountability programs. The question 
unanswered by extant research is whether assessments that differ in form 
from the traditional, norm- or criterion-referenced standardized tests would 
produce similar reactions and effects. 

Proponents of performance assessment believe that what is assessed is 
what gets taught. Therefore, the argument goes, mandating an assessment 
that requires integrated curriculum (e.g., reading and math) and higher 
order thinking and problem solving on the part of pupils will drive schools and 
teachers to align their offerings so that pupils will be able to perform 
adequately (cf. Resnick, 1989). This is the essence of measurement-driven 
reform: that building a better test will drive schools toward more ambitious
goals and reform them toward a curriculum and pedagogy geared more
toward thinking and less toward rote memory and isolated skills: the shift
from behaviorism to cognitive-constructivism. The present study represents
an attempt to understand what happens during the initial year of 
implementation of such an assessment, which state officials have termed "the 
best we know about assessment and pupil learning."

Methods of the Study

The research design chosen to address this issue is the multiple-case-study
design (Miles & Huberman, 1984). This design is based on the rationale
that understanding complex organizations such as schools requires long-term 
and close-up examination of local practice within bounded social settings. The 
actions of participants faced with a new government mandate can only be 
understood in the specific context in which they occur and referenced to the 
meanings held by those participants. The researcher aiming to understand 
these meanings must have access, over an extended period of time, to the 
classrooms and offices in which participants' definitions of the situation 
(mandated assessments, in this case) evolve and get worked out in actions. Do 
they actually provide the type of instruction geared to the ASAP performance 
test? Do they have the knowledge they need to adapt, or do they have the 
intention to do so? What is the meaning of the ASAP to teachers and others in 
schools? Getting evidence to answer questions such as these requires more 
than snapshot observations and prespecified questionnaire items. Thus, the 
qualitative case study is the best design. The decision to do more than one case 
study was not made because four is closer to the population of schools than 
one. Nor is there any intent to evaluate the four schools comparatively. The 
rationale for drawing multiple cases is that one case provides interpretive 
context for the others. A case study researcher typically immerses herself in a 
single site and tries to understand everything there is to know about it. 
Holistic understanding, however, sometimes produces the holistic fallacy. 
Things unobserved in that setting are often not considered as salient; observed 
phenomena and events may be mistakenly seen as causal. Seeing two case 
studies in parallel can alert the two researchers to features taken for granted
or overlooked in one. In the present study, for example, the influence of the 
district's philosophical support of ASAP was overlooked by the researcher in 
her within-case analysis. Simply because it was taken for granted by everyone 
in the site, she failed to observe the potential influence of this condition. Yet 
when her case was held up against another site, in which the district 
administration was not supportive of the mandate, the importance of the factor 
in explaining the relative success of the mandate in the two sites became 
obvious.

Four cases were chosen for the study. The number was determined by the
resources available to support four graduate students for the year. Only 
elementary schools were chosen, because of the need to contrast the effects of 
the new mandate with the previous one studied by Smith et al. (1990). The 
decision of which sites to select was made based on the desirability of varying 
cases across economic and social resources and prior history of testing 
demand (the importance of test results historically in the district). Thus we 
tried to find schools with greater and lesser economic resources, serving 
advantaged and disadvantaged students, and located in urban, rural, and 
suburban settings. In addition, we made use of contacts and acquaintances 
that would help us to access particular schools and districts. 

All schools we contacted and requested permission to study responded 
positively. The four sites where we conducted case studies were (a) Valor, a 
rural school with a low resource base, serving mostly poor and minority pupils 
in a K-8 district; (b) Franklin, an urban school with a relatively high resource 
base, serving mostly poor and minority pupils in a K-8 district; (c) Pines, a 
suburban school with an ethnically and economically diverse student body, in 
a large, K-8, resource-advantaged district with high test demand 
characteristics; and (d) Hilldale, a suburban school serving mostly Anglo and 
advantaged pupils, in a large, K-12, resource-advantaged district with 
moderate test demand characteristics. Additional information on the 
descriptive characteristics of the four sites is available in the case studies
themselves and summarized in the Cross-Site Data Matrix (see Appendix). 
All names used in the study are pseudonyms. District and school personnel 
were promised confidentiality. 

Five researchers were selected to conduct the case studies. Audrey Noble, 
assigned to Valor, is a fourth-year graduate student in the doctoral program in 
educational leadership and policy studies. In addition to her case study, she 
acted as research coordinator for the others. Suzii Junker, a third-year 
student in the doctoral program in reading, conducted the study at Hilldale.
Walt Heinecke, a third-year student in the doctoral program in educational 
leadership and policy studies, studied Pines. Marilyn Cabay and Yvonne 
Saffron collaborated on the study at Franklin. Cabay and Saffron are fourth- 
year students in the doctoral program in school psychology. All five of the 
researchers had at least two courses in qualitative research at the time of the
study and had produced independent studies as part of their degree programs.
All are highly experienced in various educational roles: classroom teacher, 
counselor, school administrator, school psychologist, testing coordinator. All 
five brought unique perspectives to their research role; yet consistency across 
researcher perspectives was maintained in several ways. First, a common 
design for data collection and common definitions of researcher roles were 
shared. Second, the theoretical framework focused researchers' attention on 
common aspects of the sites (the images held by the participants of pupil, 
teacher, learning, curriculum, assessment, and school structure). Third, 
monthly meetings of the researchers were held to address issues raised and 
problems at the separate sites, share memos and working papers, and the like. 
Fourth, the work of the researchers was supervised by Mary Lee Smith, who 
monitored the adequacy of data collection and analysis procedures. Finally, 
drafts of the four case studies were read by all members of the research team, 
and reactions were incorporated into the case studies by the researchers to add 
to the overall fit of the cases together and provide the interpretive context of 
each case to the others. 

Data Collection 

Each case study involved the following data collection methods. The unit 
of study was defined as the classroom within the school. The four 
participating schools provided the researchers with access to faculty meetings 
and other school events, direct observation of one third-grade and one fourth- 
grade class (except for Hilldale Elementary, in which a combined third/fourth- 
grade class was the primary participant), interviews with third-grade and 
fourth-grade teachers, and documents relevant to ASAP, curricula, and local 
testing programs. This access extended through the academic year 1992-93. 
Informal contact between researchers and participating teachers was 
maintained through 1993. The choice of third- and fourth-grade classes was 
based on the state mandate of ITBS testing in fourth grade during the month of 
October and ASAP performance testing in third grade in March. The design 
of observations followed from this schedule, with observation occasions 
clustered in the fourth-grade classes in the fall and the third-grade classes in 
the spring. The working design called for researchers to be in the targeted 
classrooms one day each week normally and twice per week immediately 
before, during, and after the testing events. They deviated from the schedule
when necessary to capture activities relevant to the research questions in the
rest of the school or district. For example, the researcher at Hilldale 
accompanied the teacher whose class she usually observed when the teacher 
attended a training session on scoring of the performance test. The researcher 
at Valor branched out to classes other than the one chosen in the design so that 
she could understand the relative authority of teachers, principal, and district 
officials in determining curriculum choices. 

The researchers played the role of "more observer-than-participant" 
(Gold, 1958), developing cordial, nonevaluative, and trusting relationships 
with the teachers and school staff. No problems with access were experienced 
at the schools over the year's data collection. However, project policy about 
confidentiality and ownership of the data had to be clarified and reiterated with 
officials in one of the districts. Our position was to maintain confidentiality 
and protection of the identity and perspectives of the participants with whom 
we dealt most directly — the teachers and principals. District officials would 
have access to only those data either that shielded the identity of the 
participants or that the participants had cleared for publication. 

Observation occasions of school and classroom activities were aimed at
understanding the role of testing in context, the meaning of mandated testing 
to teachers and school staff, test preparation for mandated tests, and the 
relationship of mandated testing to curriculum, pedagogy, and school 
structures. The conceptual framework of the study provided the focus for 
observations. That is, the researchers kept in mind the need to attend to, 
besides the normal, everyday life of the classrooms, incidents that shed light 
on the images held by participants of pupil, teacher, learning, assessment, 
and school structure. Researchers kept detailed notes of what they observed, 
transcribed their working notes, and submitted the write-ups in text files to the 
research coordinator. These were reviewed periodically to make sure the 
researchers were preserving the necessary level of concrete detail and 
recording material relevant to the research questions and conceptual 
framework. Monthly meetings of the researchers were held to coordinate 
insights and keep everyone on target. 

By design, the researchers conducted formal interviews with the 
principal and teachers whom they observed and focus group interviews with 
remaining third- and fourth-grade teachers in the school. In addition,
interviews with district officials were conducted to understand the district
perspectives on assessment and the organizational climate of the districts. 
The interview agenda and key questions and probes were developed by the 
research director and coordinator to generate data according to the conceptual 
framework. For example, teachers were asked questions such as: "The state 
believes that the new testing program will promote a new kind of instruction. 
Other than knowing what the test covers and how to administer it, what are 
the things a teacher needs to know to teach in the manner that ASAP 
promotes?" Because these interviews fit a qualitative approach to research, the 
exact wording and sequence of questions varied. It was more important to 
elicit the meanings the assessment had for participants than to standardize 
questions. The interviewees were encouraged to tell their own stories in their 
own words, the researchers using those words to construct probes so that the 
agenda could be addressed. For example, the probe for the question stated 
above might attempt to elicit information on the kinds and amounts of 
professional development the teachers had already experienced or believed to 
be important precursors of ASAP-related instruction. The agenda was drawn 
from the conceptual framework and emerging issues in the study as a whole. 
Interviews were tape-recorded and the tapes transcribed. 

Researchers at the four sites also collected documents and artifacts. For 
example, some teachers voluntarily provided work samples from students in 
ASAP-related activities and journals in which students described their 
reaction to assessments. Curriculum guides, text samples, work sheets and 
instructional packets, detailed samples of district tests and test results, 
information sent to parents, notices of meetings and training sessions, and the 
like also supplemented the observation and interview data. 

Within-Site Data Analysis 

The researchers coded their data according to the categories in the project 
conceptual framework as well as categories emerging from their site. For 
example, every instance of data that plausibly referred to or illustrated a 
teacher's image of the curriculum was so coded for subsequent retrieval. Or, a 
district administrator's contention that district CRTs were a more appropriate 
standard for achievement than ASAP results would have been coded as "image 
of testing." In addition, local issues, such as the conflict among third- and
fourth-grade teachers at Franklin about the value of moving to ASAP-like
instruction, produced the inductively-derived category "Grade-level
isolation/conflict." Researchers were encouraged to use qualitative analysis computer
programs, such as Ethnograph and Hyperqual, to identify, mark, index, and 
retrieve data that instantiated the categories. They wrote memos periodically 
to define the categories and document their thinking processes as they 
analyzed their data. Finally, they wrote assertions and produced vignettes to 
support the assertions. According to Erickson (1986), assertions are 
statements that researchers inductively derive by reading and re-reading the 
record and data. These statements are inferences about the meaning of the 
evidence. For example, one of the assertions from the study of the Valor site 
follows: "Although performance assessment is meant to encourage the social 
nature of learning, learned attitudes and behaviors (prior knowledge) 
regarding testing persist. Teachers and students respond to the function of 
assessment rather than the form. Testing for teachers and students remains 
a solitary, inactive, and structured experience." Vignettes had two functions: 
to describe a particular slice of life in the setting and to illustrate the basis in 
data from which the assertion was derived (Erickson, 1986). Thus, the vignette 
that accompanied the above-quoted assertion vividly describes how teachers 
prepared for and administered both ASAP and ITBS. The style and tone of
ASAP administration resembled that of ITBS but contrasted with that of 
regular instruction. 

Researchers established the warrant for their assertions by looking 
closely for disconfirming instances, and checking that the assertions had
sufficient confirming data of varying methods (e.g., observations vs. 
interviews). In addition, drafts of the assertions and vignettes of each case 
study were read by the other researchers, the coordinator and director. 
Revisions were made based on this feedback. Then, the researchers completed 
the case studies (Smith et al., 1994), providing their overall perspective about 
the role of mandated testing in their respective sites. 
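
The code-and-retrieve step described above can be illustrated with a minimal sketch. This is not the software the researchers used (they mention Ethnograph and Hyperqual); it is an assumption-laden illustration in Python. The `Segment` class, the function names, and the placeholder excerpts are all hypothetical; only the category labels and site pseudonyms come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One coded excerpt from field notes, an interview, or a document."""
    site: str     # pseudonymous site name, e.g. "Franklin"
    source: str   # "observation", "interview", or "document"
    text: str     # the excerpt itself (placeholders below, not study data)
    codes: tuple  # category labels attached by the researcher


def build_index(segments):
    """Map each category label to the segments that carry it."""
    index = defaultdict(list)
    for seg in segments:
        for code in seg.codes:
            index[code].append(seg)
    return index


def retrieve(index, code, site=None):
    """Return segments for a category, optionally limited to one site."""
    return [s for s in index.get(code, []) if site is None or s.site == site]


def sources_for(index, code):
    """List the distinct data sources behind a category.

    A category supported by only one kind of source has not yet been
    checked against data gathered by a different method.
    """
    return sorted({s.source for s in index.get(code, [])})


# Hypothetical placeholder records; the excerpts are NOT data from the study.
segments = [
    Segment("Franklin", "interview", "<placeholder excerpt 1>",
            ("image of testing",)),
    Segment("Franklin", "observation", "<placeholder excerpt 2>",
            ("image of testing", "Grade-level isolation/conflict")),
    Segment("Pines", "interview", "<placeholder excerpt 3>",
            ("image of curriculum",)),
]
index = build_index(segments)
```

Calling `retrieve(index, "image of testing", site="Franklin")` gathers every Franklin excerpt filed under that category, and `sources_for` gives a rough check on whether an assertion rests on more than one data collection method.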

Cross-Site Data Analysis 

The existence and use of the conceptual framework for the study as a 
whole, the monthly meetings, and supervision of researchers increased the 
likelihood that the separate case studies would have enough elements in
common to enable cross-site analysis. The final meeting of the research team
to discuss the case studies was tape-recorded to preserve a record of the ideas
generated. This meeting served two analytic purposes. First, each case was
used as interpretive context for the others. That is, elements that had been 
overlooked in one site became highlighted by comparing cases. For example, 
at Hilldale, district testing was simply not an issue, and the researcher at that 
site had consequently ignored it. At Franklin and Pines, however, the district 
testing program had a profound impact on what happened to ASAP-relevant
instruction. Through this comparison, a hole in the Hilldale account was 
readily identified and rectified. Second, it treated the researchers as 
informants in the sense that, after a year of data collection, they "knew" much 
more about the educational and social context than they could have possibly 
included in the case study. The director and coordinator could then ask them 
to summarize information on issues of cross-site interest. For example, a 
quick reading of data and a few phone calls produced data on the missing 
element from the Hilldale account on the role of district testing. 

The analysis of qualitative data is fundamentally a process of thinking 
and progressive problem solving (Erickson, 1986), with only a crude set of tools 
and procedures. The conceptual framework yielded categories such as Image 
of the Pupil. Data had been gathered that allowed us to generate assertions 
within each site about the Image of the Pupil that seemed to be held by teachers 
and district officials. In addition, we had evolved a set of working hypotheses, 
or plausible accounts and explanations, for how the change in mandated 
testing was working out at each site, that is, what particular barriers and 
facilitating conditions seemed to be responsible for local reactions. 
Furthermore, we understood that audiences for this report would be interested 
in the formal characteristics of each site (e.g., the degree of pupil 
disadvantage) and would need a variety of information to make their own 
interpretations of the data. From these considerations, we developed a set of 
dimensions for the cross-site matrix. Our aim was to provide data in the 
matrix that would reduce the sheer quantity of information to a manageable 
level without resorting to high-level abstractions or losing the sense of 
grounding and authenticity that case studies can provide. 

Based on the above considerations, the Cross-Site Data Matrix was 
constructed. The elements in each cell are short summaries, paraphrases, or
characterizations of the particular site on the selected dimensions. These
characterizations were constructed by the research director and submitted to 
the case study researchers for their substantive and editorial comments. 

The Cross-Site Data Matrix is placed in the Appendix. The dimensions of 
the matrix are as follows: 

DISTRICT CHARACTERISTICS
Size and Organization (K-8)
Resource Base
Organizational Culture
District Testing Model
Test Demand (Hi/Low Stakes)
Knowledge/Commitment of Officials to ASAP-like Instructional Principles
Belief in the Permanence of ASAP
Image of Pupil
Image of Teacher
Image of Assessment
Image of Curriculum
Reaction to ASAP Results
Prospects for Second-Year Changes

PUPIL CHARACTERISTICS
SES
Language Dominance
Ethnic Composition

COMMUNITY CHARACTERISTICS
Size and Type of Community
SES
Parent Participation and Interest in Scores

SCHOOL CHARACTERISTICS
School Organization and Size
School Structure
Grade-level Isolation/Conflict
Role of Principal
Principal Accommodation/Resistance
Curriculum/Texts
Test Burden
Test Preparation
Presence of ASAP Key Gatekeeper

TEACHER CHARACTERISTICS (Focal Teachers)
Experience
Commitment to Holistic, Thinking Instruction
Prior Knowledge of ASAP-like Instruction
Opportunities for Relevant Professional Development
Familiarity With Performance Test, Rubrics, Essential Skills
The Professional Life of Teachers
Image of Pupil
Image of Teacher
Image of Curriculum
Image of Assessment
Test Preparation (activities engaged in)
Time to Reflect, Experiment, Collaborate
Perception of/Reaction to Test Stakes
Accommodation/Resistance

The process of arraying data in the Cross-Site Data Matrix stimulated 
further thinking about what elements were most salient in accounting for the 
differences among the cases in response to the mandate. In constructing the 
Analytic Matrix (Figure 1), we started with a working assumption (analyzed 
and critiqued in Noble, 1994) that the ASAP mandate promotes changes toward 
high standards and constructivist education. Furthermore, we knew from the 
findings of the policy study (Noble, 1994) that the state had made no provision 
for capacity building in support of the mandate. Nor had the state attended to 
issues such as delivery standards or opportunity to learn. Thus, the mandate
was unfunded, and the professional development provided by the state
in support of change was meager or nonexistent. The only state mechanisms
to instigate the change included the power of the ADE to persuade through 
rhetoric (e.g., repeated reminders to district officials and teachers of the 
importance of the Arizona Essential Skills and of teaching "the way kids 
learn"), the threat of disapproval of the District Assessment Plans, and the 
performance test itself (which was initially perceived to be low-stakes), plus the 
preliminary Forms A, B, and C and workshops to train teachers how to 
administer and score the assessment. Therefore, we recognized that both the 
resources for changing toward the promoted goals and the authority and 
power to change had to be understood at the local rather than state level. 
Based on these assumptions and understandings, we chose four categories 
that seemed to account for the status of the site at the end of the first year. For 
example, the curriculum and pedagogy at Valor were virtually unchanged 
after one year of the program. No resources were available to direct toward 
ASAP-consonant activities, and thus no capacity was developed. School 
personnel acquiesced to the ASAP requirements, and ASAP merely added to 
the accountability load. Some resistance was evident in the departure of one of 
the constructivist teachers who experienced this burden. The status of change 
can be attributed in part to resource issues, knowledge, assumptive worlds, 
and organizational culture there. 

The categories in the Analytic Matrix are listed and defined as follows. 

• Resources for Change: Material Resources refers to the district's 
financial capacity to purchase or develop curriculum and to offer 
teachers professional development activities consistent with ASAP 
goals. Where financial resources are available, we ask whether they 
are directed at activities consonant or dissonant with ASAP aims. 
Knowledge Resources refers to the presence in the district and school 
of officials and teachers with knowledge and commitment to 
constructivist education and performance testing. Each site was 
characterized according to whether there was some gatekeeper, such 
as principal, coordinator, consultant, or other person who could 
interpret ASAP procedures and help teachers make changes 
consistent with ASAP aims. In some sites, a coordinator had been 
named by the district, but the person lacked knowledge, was 
unavailable to teachers, or soon left the district, and thus failed to help 
teachers make consonant changes. 

• Power to Change: We characterized each site according to its 
organizational culture and where the power exists to make changes at 
the classroom level. For example, a centralized and hierarchical 
district vests control over change at the district level, leaving teachers 
and principals with little discretion to change in contrary directions. 
Local options remaining include acquiescence, accommodations (e.g., 
dis-integrating integrated curriculum or inappropriate test 
preparation), resistance, and marginalization. 

• Assumptive Worlds: In this category we condensed the images of the 
pupil, learning, teacher, and curriculum that seemed to characterize 
both the district and the teachers at each site and the extent to which 
the dominant philosophy was either consonant (i.e., constructivist) or 
dissonant (behaviorist or concrete-sequential) with ASAP aims 
(assuming that ASAP is in fact constructivist). The constructivist 
assumptive world views the pupil as an active meaning-maker, the 
teacher as a coach or partner in meaning-making, and the 
curriculum as thematic, integrated, and negotiated, consistent with 
pupil interests and prior knowledge. The concrete-sequential 
assumptive world views the pupil as an empty receptacle, teacher as 
conduit of curriculum and imparter of skills, and the curriculum as a 
hierarchical set of standard skills for the pupils to master. 

• Role of Testing: This category reflects our characterization of the test 
demand or degree of testing stakes imposed on classrooms at each site. 
We distinguish the perceived function of tests as accountability devices 
(performed for external audiences) rather than as integral parts of 
instruction and whether there is a strong demand for high scores or 
measured change at the site. We also note the degree of test burden 
(proportion of time consumed by various testing functions), the 
expectations at the site for high or low scores based on past history, 
and where ASAP fits into the testing scheme. 

• Year-end Status: This row in the matrix reflects our perspective of 
where each site stands with respect to reactions to the ASAP mandate. 



Conclusions 

What does the multiple case study tell us about the effects of the changing 
test mandate? Variations of local response are both substantial and 
significant. After a year of implementation, ASAP has largely been absorbed 
and subsumed under local, but apparently more salient, concerns. The goal of 
enhancing constructivist, integrated, "thinking" curriculum and pedagogy 
has been addressed most directly only in Hilldale Elementary, a school 
characterized as not only economically advantaged but also already well on its 
way toward ASAP goals before (or independent of) the state mandate. Many 
Hilldale teachers were already trained in and committed to holistic pedagogy, 
having, for example, a literature-based rather than a basal reading program 
and integrated, thematic curriculum. Its principal shares constructivist 
assumptions and acts as an agent of change at both the school and the district. 
The district administrators view performance assessment as "the wave of the 
future." They accept the district responsibility (given that the state had made 
no provision for it) for the professional development of teachers to gain the 
expertise that holistic, integrated, "thinking" education requires. The district 
had financial resources and aimed them toward acquisition of compatible 
materials and professional development. Teachers take courses, seek 
consultants' advice, and participate in staff development workshops, all 
consonant with ASAP aims. There seems to be a culture of teacher 
professionalism (time and a certain degree of autonomy, yet teachers who 
hold on to behaviorist images are in the minority and marginalized) to advance 
constructivist curriculum and pedagogy. Several key teachers have made it 
their responsibility to serve on committees and take workshops related to 
ASAP and scoring rubrics. Although there is an intense interest in high 
scores among district officials and parents, the history of high test scores and 
awards at Hilldale provides Hilldale's principal with a degree of autonomy 
probably not experienced by every school in the district. Even so, the 
organizational culture in the district is decentralized, with power to make 
changes diffused among the schools. District administrators provide impetus 
to change through the power of persuasion and capacity building. The 
criterion-referenced testing program previously used in the district has been 
abandoned in favor of portfolios and ASAP Forms A, B, and C. Thus, a further 
barrier against change toward constructivist education has been removed. 
Nevertheless, even Hilldale teachers recognize that ASAP serves an 
accountability function, and they direct attention to the test as a test and to 
what aspects of the test will pay off in high scores. 

Contrast these characteristics with those of Pines. Pines' economic 
resources are equal to those of Hilldale. Yet few resources were aimed at 
acquiring materials or training its teachers in support of ASAP aims. 
Teachers who themselves are supportive of those goals are torn between 
pursuing those goals and satisfying district requirements. The culture of the 
district is centralized and policy-driven, specifying almost every curricular 
decision. It is backed by a prescriptive, district criterion-referenced testing 
program and strong demand for high scores on those tests. Once district 
curricular and testing requirements are met, there is very little time and 
energy left for teachers to pursue alternative instruction. Teachers acquiesce 
to district images of curriculum, instruction, pupil characteristics, and 
assessment, or else resist by leaving the environment. Because Pines is a 
relatively low-scoring school in a district that scores high and whose 
administrators and parents insist on high scores, the accountability pressures 
are extreme. Teachers' evaluations and principals' positions are perceived to 
be on the line. In this set of circumstances, ASAP-related goals seem 
relatively remote and irrelevant to teachers' concerns. ASAP adds to the 
accountability burden. Teachers accommodate by dis-integrating and focusing 
attention on what will be scored, but make few changes toward constructivist 
education. At the end of one year, little capacity has been created that could 
logically lead to authentic changes. 

At Franklin, the degree of disadvantage of the pupil population is central 
to teachers' and administrators' rejection of ASAP. The official view of pupils 
is that they come to school as empty vessels that must be filled, a drop at a 
time, with skills. These skills are considered to be hierarchically arranged so 
that higher order thinking or problem solving can only be pursued once basic 
skills are mastered. Thus, ASAP, which requires integration of, for example, 
reading and writing with mathematical calculation, is viewed as beyond the 
reach of Franklin's pupils, who are almost exclusively poor, minority, and 
limited English-speaking. As at Pines, the majority and official view silences 
the few teachers who might think differently. Materials and district tests that 
emphasize the concrete-sequential curriculum and behaviorist pedagogy 
sustain the dominant images. There is a high demand for demonstrated 
growth and high scores on the district criterion-referenced tests. The tests are 
the curriculum, in fact, in that there are very few instructional transactions 
outside the scope of the tests. District administrators define "master teachers" 
as those whose students get high scores. In a classic recreation of Taylorism, 
the principal designates master teachers to design instructional packets they 
have found to be successful for attaining high scores and dispenses those to the 
other teachers. District CRTs are constructed by teachers and considered to be 
the only measurement that suits this population, not ITBS and certainly not 
ASAP. Though Franklin has sufficient economic resources to modify its 
instruction and train its teachers toward ASAP aims, there is little chance 
that it will do so, so powerful is its culture to the contrary. By the end of the 
first year, the only changes evident are a passive acquiescence to the added 
accountability burden of ASAP and accommodation by dis-integrating and 
focusing on scores. 

Valor matches Franklin in the degree of disadvantage of its pupils, yet its 
rural, agricultural economy impoverishes district resources, making 
modification of local curriculum, instruction, and teacher training 
problematic. Also like Franklin, its pupils score low on standardized tests, but 
in contrast, there is relatively little pressure on teachers to raise those scores, 
either on the state-mandated ITBS or on local criterion-referenced tests. 
Valor's organizational climate is laissez-faire. Curriculum decisions are not 
driven centrally. Teachers have a degree of autonomy greater than the other 
three sites. The focal teachers observed in this site, therefore, varied among 
themselves in their images of pupil, teacher, curriculum, and assessment, 
some consistent and others inconsistent with ASAP-related aims. What 
overwhelmed culture and image at Valor, however, was the limitation in 
resources. Textbooks were twenty years old and incompatible with ASAP. The 
district's purchase in the 1980s of an off-the-shelf criterion-referenced testing 
program (in format similar to the ITBS) represented such a substantial 
investment that the district is unlikely to afford a new one better suited to process and 
integrated performance assessment. There was no individual in the school or 
district who could interpret state images or inform teachers about what needed 
to be done to adapt to ASAP. When the time came to administer ASAP, 
teachers struggled with its complexities, showing most clearly how teachers' 
prior knowledge of an instructional activity must be taken into account if a 
mandating agency expects that activity to succeed. Valor teachers, though 
competent to teach what was familiar to them, had never experienced the use 
of writing to teach reading, for example, or how to teach estimation in 
mathematics by referring to familiar objects. The aim of "thinking education" 
is pupil understanding and integration of new knowledge by referring to prior 
knowledge. The aim of concrete-sequential education is to repeat an activity 
until the pupils "get it right," as opposed to "getting it." But the Valor teachers' 
own prior knowledge was an inadequate scaffold to hold the holistic, integrated 
teaching, learning, and assessment model promoted by ASAP. When 
presented with an integrated unit in Form A or in the new social studies text 
the district adopted, the teachers actually "dis-integrated" it. That is, they 
decomposed the lesson into bits that they thought could be taught in such a way 
that all the pupils could perform correctly and get the right answer. There 
was no money for in-service training that might have helped the teachers 
make the change. 

What was common among the sites was the belief that testing that comes 
from an outside agency is still testing, with its attendant considerations that 
testing must be individualistic, competitive, silent, and objectively scored. 
Testing that is done for outside agencies is still separate from instruction and 
added on to normal school activities. Assessment from the teachers' point of 
view is what advances instruction day-to-day, for which they have multiple 
indicators besides test results. This view of ASAP as an add-on, done to satisfy 
an external audience, contradicts the state policy image that ASAP testing 
should be integrated with instruction rather than be a supplement. The state 
image implies that local curriculum not consistent with the Arizona Essential 
Skills and ASAP assessment should wither away. But teachers and principals 
orient themselves more to local demands and see state requirements as an 
unwarranted intrusion or unlikely to persist. 

Across the sites, teachers viewed ASAP as low-stakes and as aimed more at 
changing instruction than at evaluating the efficacy of schools. Some even 
regarded ASAP as a pilot or experiment. The exception was at Hilldale, where 
teachers were more knowledgeable about how the ADE intended to conduct the 
on-demand Form D assessment and about the scoring rubrics that would be 
applied to the performance test results. The focal teacher, Terri, who was the 
ASAP liaison for Hilldale, directed the attention of her pupils to those features 
of the performance assessment that would, in fact, be scored and the attention 
of her colleagues to the accountability function that ASAP was likely to serve. 

The view of ASAP as a low-stakes test designed to nudge districts toward a 
different form of pedagogy was overturned when, in the spring of 1993, the 
ADE reported ASAP scores by school and grade level, in the same manner that 
it usually reports ITBS results. At that point, more practitioners and 
administrators viewed ASAP as part of the state's accountability package. By 
that point as well, districts began struggling with the notion of setting a cut-off 
score on the performance test that would demonstrate to the ADE the districts' 
mastery of the Essential Skills measured by each assessment. 

This study has shown how the actions of practitioners are far from 
uniform in response to a policy mandate. Local interpretations and 
organizational norms intervened to color, distort, delay, enhance, or thwart 
the intentions of the policy and the policy-shaping community. 

It is, however, only the story of the first year of implementation of a 
measurement-driven reform, under perceived low-stakes conditions. The 
proponents of such reform might be heartened by the prospects of change 
under conditions of increased stakes, brought about by the ADE's publicizing 
school scores, attaching mastery levels to the performance test, and 
attempting to make grade promotion and high school graduation related to 
performance on ASAP. Such ratcheting of stakes may increase educators' 
attention to changing instruction in the desired direction. Or, the reaction 
may be to do what is necessary to increase the scores themselves, as the 
literature on dysfunctional side effects of accountability suggests (Campbell, 
1979). 

In either case, the prospects for reform toward the aims of the mandate 
must be judged in light of one notable barrier, the variable status of teachers' 
expertise or prior knowledge of holistic, integrated, thinking curriculum and 
pedagogy. Hilldale teachers have reported that it took years of expert 
guidance, and time to experiment, reflect, and collaborate, once they 
personally made the commitment to change in this direction. No institutional 
obstacle was placed in their path. The distance on this dimension between 
Hilldale and the other schools we studied is vast. The means for schools to 
traverse this distance have been ignored in policy formation and 
administration, or left to the vagaries of district and school practice. 